docs(deepnsm): psychometric validation framework + vertical HHTL bundling spec #39

Merged

AdaWorldAPI merged 4 commits into master from claude/transcode-deepnsm-rust-oNa1Z on Mar 28, 2026

Conversation

@AdaWorldAPI (Owner)

docs(deepnsm): psychometric validation framework + vertical HHTL bundling spec

Two architectural concepts saved for dedicated implementation sessions:

1. Psychometric validation for DeepNSM measurement instrument:
   - Cronbach's α across 128 projections (2³ SPO × 2⁴ HHTL)
   - Split-half reliability: Strategy A vs Strategy B distance
   - IRT item parameters: per-word difficulty + discrimination
   - Factor analysis: do 74 primes factor into 16 NsmCategory?
   - Construct/convergent/discriminant validity across codec chain
   - Polysemy detection via α drop across projections
   - P-values with 128 independent measurements per pair

2. Vertical HHTL bundling (studio mixing analogy):
   - Leaves → bundle → Twigs → bundle → Branches → bundle → Hip
   - Each level = majority vote denoising (background noise removal)
   - Unbind bottom-up to verify reconstruction (information loss audit)
   - Combined SPO × HHTL = 128-way factorial decomposition
   - Cascade as psychometric filter: discrimination, factor analysis,
     composite reliability, SEM, residual analysis
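
The reliability check in item 1 can be sketched concretely. This is a minimal illustration, not repo code: each of the 128 projections is treated as a test "item" scored on n word pairs, and Cronbach's α is computed over the item-by-pair score matrix (`cronbach_alpha` and `variance` are hypothetical names).

```rust
// scores[i][p] = similarity reported by projection i for word pair p.
fn variance(xs: &[f32]) -> f32 {
    let n = xs.len() as f32;
    let mean = xs.iter().sum::<f32>() / n;
    xs.iter().map(|x| (x - mean) * (x - mean)).sum::<f32>() / n
}

/// alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
fn cronbach_alpha(scores: &[Vec<f32>]) -> f32 {
    let k = scores.len() as f32; // number of projections (items), e.g. 128
    let n_pairs = scores[0].len();
    let item_var: f32 = scores.iter().map(|item| variance(item)).sum();
    // Total score per pair = sum over all projections.
    let totals: Vec<f32> = (0..n_pairs)
        .map(|p| scores.iter().map(|item| item[p]).sum())
        .collect();
    (k / (k - 1.0)) * (1.0 - item_var / variance(&totals))
}
```

Perfectly consistent projections drive α toward 1; an α drop across a subset of projections is exactly the polysemy signal described above.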

Key insight: NARS confidence IS measurement reliability (formalized).
Every similarity judgment gets a confidence interval backed by
128 independent projection measurements.
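
The confidence-interval idea can be sketched as follows; this is an illustration under a normal approximation, not the NARS confidence formula itself (`confidence_interval` is a hypothetical name).

```rust
// Treat the 128 per-projection similarity scores as repeated measurements
// and report mean ± 95% CI.
fn confidence_interval(measurements: &[f32]) -> (f32, f32) {
    let n = measurements.len() as f32;
    let mean = measurements.iter().sum::<f32>() / n;
    // Sample variance (n - 1 divisor).
    let var = measurements.iter().map(|x| (x - mean) * (x - mean)).sum::<f32>() / (n - 1.0);
    let half_width = 1.96 * (var / n).sqrt(); // 95% two-sided, normal approx
    (mean - half_width, mean + half_width)
}
```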

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

claude added 4 commits March 28, 2026 19:10
…231 lines)

The expansion that was deferred since session start. Adds:

Evaluation types (transcoded from Python nsm_evaluation.py + prompts.py):
  - Prediction: grader output with logprob, rank, match status
  - SubstitutabilityScore: per-grader scoring with minimality + entailment deltas
  - Explication: NSM paraphrase with legality_score() (primes/molecules/circularity)
    + calculate_averages() + get_truncated()
  - AmbiguousExample: masked passage with get_truncated() (removes non-UNK sentences)
  - ModelResult: aggregated evaluation across all explications

Static sets via LazyLock (Rust 1.94):
  - NSM_PRIMES_SET: 78 primes including multi-word ("a long time", "don't want")
  - STOP_WORDS: English stopwords minus NSM primes (one-time filtered)
  - is_nsm_prime(), is_stop_word(), LEGAL_PUNCTUATION
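
The LazyLock shape described above looks roughly like this. A hedged sketch: the real NSM_PRIMES_SET holds 78 entries; this sample carries only the two multi-word primes the commit message names.

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

// One-time initialization on first access; no lazy_static macro needed.
static NSM_PRIMES_SET: LazyLock<HashSet<&'static str>> = LazyLock::new(|| {
    ["I", "you", "someone", "something", "a long time", "don't want"]
        .into_iter()
        .collect()
});

fn is_nsm_prime(word: &str) -> bool {
    NSM_PRIMES_SET.contains(word)
}
```

Multi-word entries work because membership is tested on whole strings, not tokens.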

CAM-PQ bridge:
  - load_nsm_codebook(): codebook_pq.bin → CamCodebook (96KB, [6][256][16] f32)
  - load_cam_codes(): cam_codes.bin → Vec<CamFingerprint> (5050 × 6 bytes)
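
The codebook byte layout works out to 6 subspaces × 256 centroids × 16 dims of little-endian f32 = 98,304 bytes (~96 KB). A stand-alone parsing sketch, assuming that layout (`parse_codebook` is hypothetical, not the crate's load_nsm_codebook signature):

```rust
const SUBSPACES: usize = 6;
const CENTROIDS: usize = 256;
const DIMS: usize = 16;

fn parse_codebook(bytes: &[u8]) -> Vec<Vec<Vec<f32>>> {
    assert_eq!(bytes.len(), SUBSPACES * CENTROIDS * DIMS * 4);
    // Decode consecutive little-endian f32 values.
    let mut floats = bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]));
    let mut book = Vec::with_capacity(SUBSPACES);
    for _ in 0..SUBSPACES {
        let mut centroids = Vec::with_capacity(CENTROIDS);
        for _ in 0..CENTROIDS {
            let centroid: Vec<f32> = (0..DIMS).map(|_| floats.next().unwrap()).collect();
            centroids.push(centroid);
        }
        book.push(centroids);
    }
    book
}
```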

36-bit SPO triple:
  - SpoTriple: 12-bit subject + predicate + object packed in u64
  - new(), subject(), predicate(), object()
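
The 36-bit packing can be sketched directly; the field order (subject in the high bits) is an assumption, not confirmed by the source.

```rust
/// Three 12-bit fields packed into the low 36 bits of a u64.
#[derive(Clone, Copy, PartialEq, Debug)]
struct SpoTriple(u64);

impl SpoTriple {
    fn new(subject: u16, predicate: u16, object: u16) -> Self {
        debug_assert!(subject < 4096 && predicate < 4096 && object < 4096);
        // Assumed layout: [subject:12][predicate:12][object:12].
        SpoTriple(((subject as u64) << 24) | ((predicate as u64) << 12) | object as u64)
    }
    fn subject(self) -> u16 { ((self.0 >> 24) & 0xFFF) as u16 }
    fn predicate(self) -> u16 { ((self.0 >> 12) & 0xFFF) as u16 }
    fn object(self) -> u16 { (self.0 & 0xFFF) as u16 }
}
```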

Prompt templates + builders:
  - NSM_EXPLICATION_SYS_INST, RECOVERY_PROMPT_SYS_INST
  - build_explication_prompt() with few-shot support
  - build_recover_prompt() with optional explication hint

23 tests passing (12 original + 11 new).

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

All consumer code uses crate::simd only. Zero raw intrinsics.
LazyLock dispatch table selects AVX-512 vs AVX2 at startup.

cam_pq.rs — squared_l2():
  - Called 1,536× per CAM-PQ query (6 subspaces × 256 centroids)
  - Was: scalar iter().zip().map().sum()
  - Now: F32x16 for 16D subvectors (one SIMD lane = one subspace dimension)
  - Fast path: n==16 → single load-subtract-multiply-reduce
  - Medium path: n>=16 → chunked F32x16 with mul_add + scalar remainder
  - Estimated 16× speedup on hot path
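
The chunk-plus-remainder structure above can be mirrored in scalar form. A sketch only: the repo's F32x16 wrapper is replaced here by 16-element chunks that the compiler can auto-vectorize.

```rust
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut sum = 0.0f32;
    // Split off the part divisible by the 16-wide lane width.
    let (a16, a_rem) = a.split_at(a.len() / 16 * 16);
    let (b16, b_rem) = b.split_at(a16.len());
    for (ca, cb) in a16.chunks_exact(16).zip(b16.chunks_exact(16)) {
        // One iteration = one load-subtract-mul_add group (the n==16 fast path
        // is exactly one trip through this loop).
        for i in 0..16 {
            let d = ca[i] - cb[i];
            sum = d.mul_add(d, sum);
        }
    }
    // Scalar remainder, as in the medium path.
    for (x, y) in a_rem.iter().zip(b_rem) {
        let d = x - y;
        sum = d.mul_add(d, sum);
    }
    sum
}
```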

deepnsm.rs — nsm_decompose() normalization:
  - Was: scalar iter().sum() + scalar /= loop
  - Now: F32x16 accumulation (4×16=64 elements) + scalar remainder (10)
  - Normalize via F32x16 * splat(1/sum) + scalar tail
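
A scalar reference for that normalization step, for orientation only: one summing pass, then a multiply-by-reciprocal pass (the SIMD version applies splat(1/sum) 16 lanes at a time and handles the 10-element tail separately).

```rust
fn nsm_normalize(weights: &mut [f32; 74]) {
    let sum: f32 = weights.iter().sum();
    if sum != 0.0 {
        let inv = 1.0 / sum; // multiply beats 74 divisions
        for w in weights.iter_mut() {
            *w *= inv;
        }
    }
}
```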

deepnsm.rs — nsm_to_fingerprint() XOR:
  - Was: scalar for j in 0..1250 { result[j] ^= pattern[j] }
  - Now: U8x64 XOR (19×64=1216 bytes) + scalar remainder (34 bytes)
  - 64 bytes per SIMD operation vs 1 byte scalar
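
The same block-plus-tail split, in scalar form for reference: 19 blocks of 64 bytes (1216) plus a 34-byte tail covers the 1250-byte fingerprint. In the SIMD version each inner block is a single U8x64 operation.

```rust
fn xor_accumulate(result: &mut [u8; 1250], pattern: &[u8; 1250]) {
    let mut i = 0;
    // 64-byte blocks: one U8x64 XOR each in the vectorized code.
    while i + 64 <= result.len() {
        for j in 0..64 {
            result[i + j] ^= pattern[i + j];
        }
        i += 64;
    }
    // 34-byte scalar tail.
    for j in i..result.len() {
        result[j] ^= pattern[j];
    }
}
```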

deepnsm.rs — nsm_similarity() cosine:
  - Was: scalar 3-accumulator loop over 74 elements
  - Now: F32x16 with mul_add for dot/mag_a/mag_b (4×16=64) + scalar tail (10)
  - Three reductions in one pass
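
A scalar reference for the fused cosine: dot product and both squared magnitudes accumulated in a single pass over the 74-element prime-weight vectors, matching the three-accumulator structure the F32x16 version vectorizes.

```rust
fn nsm_similarity(a: &[f32; 74], b: &[f32; 74]) -> f32 {
    let (mut dot, mut mag_a, mut mag_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b.iter()) {
        dot = x.mul_add(*y, dot);     // Σ a·b
        mag_a = x.mul_add(*x, mag_a); // Σ a²
        mag_b = y.mul_add(*y, mag_b); // Σ b²
    }
    dot / (mag_a.sqrt() * mag_b.sqrt())
}
```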

23 deepnsm tests + 7 dispatch tests passing. Zero regressions.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

Document the category-padded SoA layout (16 categories × 16 slots = 256
F32x16 lanes) as a future optimization concept in deepnsm.rs.
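
A sketch of how that future layout might look, under the 16 × 16 padding described above: each category row occupies exactly one F32x16-wide stripe, so per-category reductions never cross a lane-group boundary. Names are illustrative, not the planned deepnsm.rs API.

```rust
const CATEGORIES: usize = 16; // NsmCategory count
const SLOTS: usize = 16;      // padded slots per category

#[derive(Default)]
struct CategoryPadded {
    rows: [[f32; SLOTS]; CATEGORIES], // 256 floats, one F32x16 stripe per row
}

impl CategoryPadded {
    fn set(&mut self, category: usize, slot: usize, weight: f32) {
        self.rows[category][slot] = weight;
    }
    /// One horizontal reduction per 16-wide row (one F32x16 reduce each).
    fn category_sums(&self) -> [f32; CATEGORIES] {
        let mut sums = [0.0f32; CATEGORIES];
        for (c, row) in self.rows.iter().enumerate() {
            sums[c] = row.iter().sum();
        }
        sums
    }
}
```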

Verified no overlap with existing patterns:
  - blasgraph CSR/CSC: graph adjacency matrix, not semantic vectors
  - SPO semiring: cost algebra, not vector layout
  - neighborhood CLAM: search scope, not decomposition format
  - aabb/spatial_hash SoA: spatial coords (x,y,z), not semantic categories
  - dn_tree SoA: HV summary layout, not NSM category decomposition

The concept is clean for future implementation.

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

…ling spec

AdaWorldAPI merged commit 9ab2d43 into master on Mar 28, 2026
5 of 10 checks passed